Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures



Similar Resources

Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures

Manycore architectures – hundreds to thousands of cores per processor – are seen by many as a natural evolution of multicore processors. Taking advantage of this massive parallelism in practice requires a productive parallel programming model and an efficient runtime for the scheduling and coordination of concurrent tasks. A critical prerequisite for an efficient runtime is a scalable synchro...


Efficient On-Chip Pipelined Streaming Computations on Scalable Manycore Architectures

The performance of manycore processors is limited by programs’ use of off-chip main memory. Organizing a streaming computation as a pipeline restricts main-memory reads and writes to the tasks at the pipeline’s boundaries. The Single-chip Cloud Computer (SCC) offers 48 cores linked by a high-speed on-chip network and allows the implementation of such an on-chip pipelined technique. W...


Self-adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures

Based on the premise that preconditioners needed for scientific computing are required to be not only robust in the numerical sense but also scalable for up to thousands of light-weight cores, we argue that this two-fold goal is achieved for the recently developed self-adaptive multi-elimination preconditioner. For this purpose, we revise the underlying idea and analyze the performance of impl...


Approximate weighted matching on emerging manycore and multithreaded architectures

Graph matching is a prototypical combinatorial problem with many applications in high performance scientific computing. Optimal algorithms for computing matchings are challenging to parallelize. Approximation algorithms are amenable to parallelization and are therefore important to compute matchings for large scale problems. Approximation algorithms also generate nearly optimal solutions that a...


Temperature-Aware Amdahl’s Law for Manycore Architectures

Small cores provide greater throughput per unit area and per watt when sufficient concurrency is available, motivating organizations with many simple cores. However, sufficient concurrency is often not available; even applications that can use many cores often have serial parts. Amdahl’s Law favors an asymmetric architecture and shows that one or more large, high-ILP cores are needed in these c...



Journal

Journal title: IEEE Transactions on Parallel and Distributed Systems

Year: 2018

ISSN: 1045-9219

DOI: 10.1109/tpds.2017.2755655